Section: New Results

Sample selection for SVM learning on large data sets

Participants : Sonia Chaibi, Xavier Descombes, Eric Debreuve.

Support Vector Machines (SVMs) are a popular framework for supervised learning. However, they do not scale well to large data sets, since learning is performed by an optimization procedure involving the whole data set, while in the end only a small subset of the samples (the so-called support vectors) is retained for prediction. Efficient algorithms exist, but it remains worthwhile to filter out, before launching the learning procedure, as many samples as possible among those that will certainly not become support vectors.

Sonia Chaibi, a PhD student from UBMA, Algeria, visited the team for a month to collaborate on this subject. The method relies on successive unsupervised sample clustering steps. After each clustering, the homogeneity of the clusters in terms of sample class assignment is used to decide which samples are unlikely to be close to the separating hyperplane (and hence unlikely to be selected as support vectors), and which samples are apparently close to it. The former can be discarded, greatly reducing the number of samples to be processed by the SVM algorithm, while the latter are kept, preserving the precision of the separating hyperplane as much as possible. A minimal sketch of this filtering idea is given below.
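
As an illustration, the following Python sketch performs a single clustering-and-filtering pass using scikit-learn. It is an assumption-laden simplification: the clustering algorithm (k-means), the number of clusters, and the purity threshold are illustrative choices rather than those made in this work, and the actual method iterates the clustering step.

# Minimal sketch: discard samples in class-homogeneous clusters (assumed far
# from the separating hyperplane) and keep samples in mixed clusters, then
# train the SVM on the reduced set. Parameter values are illustrative only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def filter_samples(X, y, n_clusters=50, purity_threshold=0.95):
    """Return a boolean mask of the samples to keep for SVM training."""
    cluster_ids = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
    keep = np.zeros(len(y), dtype=bool)
    for c in range(n_clusters):
        members = cluster_ids == c
        _, counts = np.unique(y[members], return_counts=True)
        purity = counts.max() / counts.sum()
        if purity < purity_threshold:
            # Mixed cluster: its samples may lie near the class boundary.
            keep |= members
    if not keep.any():
        keep[:] = True  # fallback: no filtering if every cluster is pure
    return keep


# Toy usage on synthetic two-class data (hypothetical example).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
keep = filter_samples(X, y)
svm = SVC(kernel="rbf").fit(X[keep], y[keep])

The SVM is then trained on the kept samples only; with a sufficiently strict purity threshold, most samples near the class boundary survive the filtering, so the loss in precision of the resulting hyperplane should stay small.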